Default-all is dangerous!

Authors

  • Wolfgang Gatterbauer
  • Alexandra Meliou
  • Dan Suciu
Abstract

We show that the default-all propagation scheme for database annotations is dangerous. Dangerous here means that it can propagate annotations to the query output that are semantically irrelevant to the query the user asked. This is the result of considering all relationally equivalent queries and returning the union of their where-provenance in an attempt to define a propagation scheme that is insensitive to query rewriting. We propose an alternative query-rewrite-insensitive (QRI) where-provenance called minimal propagation. It is analogous to the minimal witness basis for why-provenance, straightforward to evaluate, and returns all relevant and only relevant annotations.

1 Query-Rewrite-Insensitive provenance

Provenance is sensitive to query rewriting unless carefully defined. Sensitive here means that the returned provenance may differ for a relationally equivalent query (we focus exclusively on conjunctive queries under set semantics). This is surprising at first, since we are accustomed to leaving it to the database engine to choose the simplest relationally equivalent query to return our results. Once we also consider provenance, we are no longer guaranteed to get the provenance output we expect. Following this line of argument, Buneman et al. [2] proposed that it is important to find a clean semantics for provenance that is guaranteed to give the same result for relationally equivalent queries.

At least two well-known query-rewrite-insensitive (QRI) versions have been defined: Buneman et al. [2] defined the minimal witness basis for why-provenance, and Bhagwat et al. [1] defined the default-all propagation scheme for where-provenance. Our goal with this paper is to point out some semantic problems with the way the QRI property is achieved by default-all propagation. We also show how to fix these problems with an alternative propagation scheme.
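The sensitivity to query rewriting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the relations `R` and `S`, the annotation labels `a1` to `a4`, and the helper `join_project`, which copies annotations from whichever cell the select clause names. It is not the formal propagation definition from [1] or [3]. The two queries differ only in whether the output column B is taken from R or from S, so they are relationally equivalent.

```python
# Each cell is a pair (value, set of annotation labels).
# Relations and labels are hypothetical toy data.
R = [{"A": (1, {"a1"}), "B": (2, {"a2"})}]
S = [{"B": (2, {"a3"}), "C": (3, {"a4"})}]


def join_project(r_rows, s_rows, b_source):
    """SELECT R.A, <b_source>.B FROM R, S WHERE R.B = S.B (set semantics).

    Annotations are copied from the cells named in the select clause.
    Returns a dict: output tuple -> (annotations on A, annotations on B).
    """
    out = {}
    for r in r_rows:
        for s in s_rows:
            if r["B"][0] != s["B"][0]:
                continue  # join condition fails
            a_val, a_ann = r["A"]
            b_val, b_ann = (r if b_source == "R" else s)["B"]
            ann_a, ann_b = out.setdefault((a_val, b_val), (set(), set()))
            ann_a |= a_ann  # in-place union keeps duplicates merged
            ann_b |= b_ann
    return out


q1 = join_project(R, S, "R")  # output column B copied from R
q2 = join_project(R, S, "S")  # output column B copied from S

print(set(q1) == set(q2))  # same output tuples
print(q1[(1, 2)])          # annotations when B comes from R
print(q2[(1, 2)])          # annotations when B comes from S
```

Both queries return the single tuple (1, 2), yet the B column carries `a2` under the first and `a3` under the second. A QRI scheme has to return the same annotations for both. This sketch compares only two rewritings, whereas default-all, as the abstract describes it, takes the union over all relationally equivalent queries.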
            Why-provenance                       Where-provenance
Naive       witness [2]                          “SQL interpretation”
Standard    witness basis (α_w) [2]              propagation (α_p) [3]
QRI         minimal witness basis (α^m_w) [2]    default-all propagation (α^d_p) [1]

Figure 1: Particular definitions (naive, standard, QRI) for why- and where-provenance considered in this paper.

Due to space constraints, and to keep this paper to the point, we assume that the reader has basic familiarity with the provenance concepts given in Fig. 1 and do not repeat their formal definitions. Instead, we refer to the detailed survey of Cheney et al. [4], from which we also borrow the running example of Fig. 2 and Fig. 4 (and the milk example, after giving a real-world interpretation to the annotations). Appendix A summarizes the notation used throughout this paper.

2 The minimal witness basis as QRI why-provenance

Why-provenance identifies witness tuples: “What input tuples contribute to the presence of each output tuple?” A witness is a subset of the input tuples that is sufficient to ensure that a given output tuple t appears in the result of a query. This definition implies that the whole database is a witness, as it is sufficient for t to appear. The witness basis or why-provenance α_w(t,Q) is a subset containing only the relevant witnesses, where the definition by Buneman et al. [2] makes precise what “relevant” means. Intuitively, those tuples that have been involved in some operation during query evaluation are part of the witness basis. It turns out that why-provenance is not QRI: relationally equivalent queries may have different witness bases. Buneman et al. [2] showed that a subset of the witness basis, called the minimal witness basis and written here as α^m_w(t,Q), is invariant under rewriting. It con…

arXiv:1105.4395v1 [cs.DB] 23 May 2011
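The notions of witness and minimal witness from Section 2 can be made concrete with a small brute-force sketch. The toy database `DB`, the conjunctive query in `query`, and the helper names are all hypothetical. Enumerating every subset of the input is an illustrative shortcut that is only feasible for tiny inputs, and it is not the syntactic definition of the witness basis given in [2].

```python
from itertools import combinations

# Hypothetical toy database: each tuple is tagged with its relation name.
DB = frozenset({
    ("R", (1, 2)), ("R", (1, 3)),
    ("S", (2, 5)), ("S", (3, 5)), ("S", (4, 6)),
})


def query(db):
    """Conjunctive query Q(x, z) :- R(x, y), S(y, z) under set semantics."""
    r = [vals for (name, vals) in db if name == "R"]
    s = [vals for (name, vals) in db if name == "S"]
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def witnesses(db, t):
    """All subsets of db that are sufficient for t to appear in the result."""
    items = sorted(db)
    return [
        frozenset(subset)
        for k in range(len(items) + 1)
        for subset in combinations(items, k)
        if t in query(frozenset(subset))
    ]


def minimal_witnesses(db, t):
    """Witnesses that have no proper subset which is also a witness."""
    ws = witnesses(db, t)
    return {w for w in ws if not any(v < w for v in ws)}


t = (1, 5)
print(DB in witnesses(DB, t))  # the whole database is a witness
for w in sorted(minimal_witnesses(DB, t), key=sorted):
    print(sorted(w))
```

For the output tuple (1, 5), the minimal witnesses are the two join paths {R(1,2), S(2,5)} and {R(1,3), S(3,5)}. The tuple S(4,6) appears in many witnesses, including the whole database, but in no minimal one.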

Journal:
  • CoRR

Volume: abs/1105.4395

Publication date: 2011